2.3 Minimal sufficiency and the Lehmann-Scheffé property. If a statistic $T$, for
example a real-valued statistic, is sufficient for a family $\mathcal{P}$ of laws, then for any other
statistic $U$, say with values in $\mathbb{R}^k$, the statistic $(T, U)$ with values in $\mathbb{R}^{k+1}$ is also sufficient.
In terms of σ-algebras, if the family $\mathcal{P}$ is defined on a σ-algebra $\mathcal{B}$ and a sub-σ-algebra $\mathcal{A}$
is sufficient for $\mathcal{P}$, then any other σ-algebra $\mathcal{C}$ with $\mathcal{A} \subset \mathcal{C} \subset \mathcal{B}$ is also sufficient. But since
the idea of sufficiency is data reduction, one would like to have a sufficient σ-algebra as
small as possible, or a sufficient statistic of dimension as small as possible.

A σ-algebra $\mathcal{A}$ will be called minimal sufficient for $\mathcal{P}$ if it is sufficient and for any
sufficient σ-algebra $\mathcal{C}$, and each $A \in \mathcal{A}$, there is a $C \in \mathcal{C}$ such that $1_A = 1_C$ a.s. for each
$P \in \mathcal{P}$. So, $\mathcal{A}$ is included in $\mathcal{C}$ up to almost sure equality of sets. Then, a statistic $T$
with values in a measurable space $(Y, \mathcal{F})$ will be called minimal sufficient iff $T^{-1}(\mathcal{F})$ is a
minimal sufficient σ-algebra.

Example. Let $\mathcal{P}$ be a family of symmetric laws on $\mathbb{R}$, such as the set of all normal laws
$N(0, \sigma^2)$, $\sigma > 0$. Considering $n = 1$ for simplicity, the identity function $x$ is (always) a
sufficient statistic, but it is not minimal sufficient in this case, where $|x|$ is also sufficient.
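
The symmetry behind this example can be checked numerically. The sketch below only
verifies that the $N(0, \sigma^2)$ density takes the same value at $x$ and $-x$ for a few
arbitrarily chosen points, so that the density is a function of $|x|$; the function names
and test points are illustrative assumptions, not part of the text.

```python
import math

def normal_density(x: float, sigma: float) -> float:
    """Density of N(0, sigma^2) at x."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def densities_agree_on_abs(xs, sigmas, tol=1e-15):
    """True if the density at x and at -x agree for every (x, sigma) pair tried."""
    return all(
        abs(normal_density(x, s) - normal_density(-x, s)) <= tol
        for x in xs
        for s in sigmas
    )

# Arbitrary illustrative points and scale parameters (assumptions).
symmetric = densities_agree_on_abs([0.3, 1.0, 2.5], [0.5, 1.0, 3.0])
```

A finite check like this is not a proof; it only illustrates why replacing $x$ by $|x|$
loses no information about $\sigma$.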

For dominated families, a minimal sufficient σ-algebra always exists:

2.3.1 Theorem (Bahadur). Let $\mathcal{P}$ be a family of laws on a measurable space $(S, \mathcal{B})$,
dominated by a σ-finite measure $\mu$. Then there is always a minimal sufficient σ-algebra $\mathcal{A}$
for $\mathcal{P}$. Also, there is such a σ-algebra $\mathcal{A}$ containing all sets $B$ in $\mathcal{B}$ for which $P(B) = 0$ for
all $P \in \mathcal{P}$, and such an $\mathcal{A}$ is unique.

Proof. Take a law $\nu$ equivalent to $\mathcal{P}$ from Lemma 2.1.6(d). Choose densities $dP/d\nu$ for all
$P \in \mathcal{P}$ and let $\mathcal{A}$ be the smallest σ-algebra for which all the $dP/d\nu$ are measurable. Then
by Theorem 2.1.4, $\mathcal{A}$ is sufficient. Next, let $\mathcal{C}$ be any sufficient σ-algebra for $\mathcal{P}$. Let $\mathcal{A}_1$ be
the collection of sets $A$ in $\mathcal{A}$ for which there exists a $C \in \mathcal{C}$ with $1_C = 1_A$ a.s. for every
$P \in \mathcal{P}$. Then $\mathcal{A}_1$ is a σ-algebra, since if $1_A = 1_C$ a.s. for all $P \in \mathcal{P}$, the same is true for
the complements, with $1 - 1_A = 1 - 1_C$, and if $1_{A(j)} = 1_{C(j)}$ a.s. for all $P$ and all $j$, the same is true
for the unions of the sequences $A(j)$ and $C(j)$. By the proof of Theorem 2.1.4, (c) implies
(b), each $dP/d\nu$ must equal a $\mathcal{C}$-measurable function a.s. ($\nu$). Thus the sets $\{dP/d\nu > t\}$
for each $P \in \mathcal{P}$ and real number $t$ are in $\mathcal{A}_1$. Since these sets generate $\mathcal{A}$ (RAP, Theorem
4.1.6), $\mathcal{A}_1 = \mathcal{A}$ and $\mathcal{A}$ is minimal sufficient.

By choice of $\nu$, the collection $\mathcal{Z}$ of sets $B$ (in $\mathcal{B}$) with $P(B) = 0$ for all $P \in \mathcal{P}$ is the
same as $\{B \in \mathcal{B} : \nu(B) = 0\}$. The σ-algebra $\mathcal{Y}$ generated by $\mathcal{Z}$ and $\mathcal{A}$ is easily seen to be
minimal sufficient. If we start with any other minimal sufficient σ-algebra $\mathcal{C}$ in place of $\mathcal{A}$,
it follows easily from the minimal sufficiency of both $\mathcal{A}$ and $\mathcal{C}$ that the resulting $\mathcal{Y}$ will be
the same. So $\mathcal{Y}$ is uniquely determined. □

The σ-algebra $\mathcal{Y}$ just treated may be called "the minimal sufficient σ-algebra," although
as a collection of sets it is actually the largest of all minimal sufficient σ-algebras.

An idea closely related to minimal sufficiency is the Lehmann-Scheffé property, as
follows:

Definition. Given a collection $\mathcal{P}$ of laws on a measurable space $(S, \mathcal{B})$, a sub-σ-algebra
$\mathcal{A} \subset \mathcal{B}$ will be called a Lehmann-Scheffé (LS) σ-algebra for $\mathcal{P}$ iff whenever $f$ is an
$\mathcal{A}$-measurable function with $\int f\,dP = 0$ for all $P \in \mathcal{P}$, we have $f = 0$ a.s. for all $P \in \mathcal{P}$.
A statistic will be called an LS statistic for $\mathcal{P}$ iff the smallest σ-algebra for which it is
measurable is LS for $\mathcal{P}$.

Lehmann and Scheffé called σ-algebras satisfying their property complete. This is
different from the notion of complete class of decision rules. Also, in measure theory, a
σ-algebra $\mathcal{S}$ may be called complete for a measure $\mu$ if it contains all subsets of sets of
$\mu$-measure 0. The Lehmann-Scheffé property is, evidently, quite different. So, it seemed
appropriate to name it here after its authors. It is equivalent to uniqueness of $\mathcal{A}$-measurable
unbiased estimators:

2.3.2 Theorem. A sub-σ-algebra $\mathcal{A}$ is LS for $\mathcal{P}$ if and only if for every real-valued function
$g$ on $\mathcal{P}$ having an unbiased $\mathcal{A}$-measurable estimator, the estimator is unique up to equality
a.s. for all $P \in \mathcal{P}$.

Proof. The constant function 0 always trivially has an unbiased estimator by the statistic
which is identically 0 (and so measurable for any $\mathcal{A}$). Uniqueness of this estimator up to
equality a.s. for all $P \in \mathcal{P}$ yields the definition of the LS property. Conversely if $\mathcal{A}$ is LS
for $\mathcal{P}$, suppose $T$ and $U$ are both $\mathcal{A}$-measurable and both unbiased estimators of a function
$g$ on $\mathcal{P}$. Then $T - U$ has integral 0 for all $P \in \mathcal{P}$, so $T - U = 0$ a.s. and $T = U$ a.s. for all
$P \in \mathcal{P}$. □

Some σ-algebras are LS just because they are small. For example, the trivial σ-algebra
$\{\emptyset, S\}$ is always LS. For any measurable set $A$, the σ-algebra $\{\emptyset, A, A^c, S\}$ is LS for $\mathcal{P}$ unless
$P(A)$ takes the same value, strictly between 0 and 1, for all $P$ in $\mathcal{P}$. So LS σ-algebras will be
interesting only when they are large enough. One useful measure of being large enough is
sufficiency. If a function $g$ on $\mathcal{P}$ has an unbiased estimator $U$ and $\mathcal{A}$ is a sufficient σ-algebra,
then $T = E_P(U|\mathcal{A})$, which doesn't depend on $P \in \mathcal{P}$ by Theorem 2.1.8, is an unbiased,
$\mathcal{A}$-measurable estimator as in Corollary 2.2.3 and Theorem 2.3.2.
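
The four-set example reduces to linear algebra: a function measurable for
$\{\emptyset, A, A^c, S\}$ has the form $f = a 1_A + b 1_{A^c}$, with $\int f\,dP = aP(A) + b(1 - P(A))$. The
sketch below assumes just two laws, with $P_1(A) = p_1$ and $P_2(A) = p_2$ strictly between 0
and 1, and checks when the two zero-integral constraints force $a = b = 0$. The function
name and the numerical values are illustrative assumptions.

```python
from fractions import Fraction

def zero_mean_functions_are_trivial(p1: Fraction, p2: Fraction) -> bool:
    """For f = a*1_A + b*1_{A^c}, the constraints a*p + b*(1-p) = 0 at p = p1 and p = p2
    force a = b = 0 exactly when the 2x2 determinant below is nonzero."""
    det = p1 * (1 - p2) - p2 * (1 - p1)  # simplifies to p1 - p2
    return det != 0

# Two laws giving A different probabilities: only f = 0 has integral 0 under both.
distinct = zero_mean_functions_are_trivial(Fraction(1, 3), Fraction(1, 2))
# Two laws giving A the same probability: nonzero solutions exist.
equal = zero_mean_functions_are_trivial(Fraction(1, 3), Fraction(1, 3))

# With P(A) = 1/3 under every law, f = 2*1_A - 1_{A^c} has integral 0 but is not 0.
a, b, p = Fraction(2), Fraction(-1), Fraction(1, 3)
counterexample_integral = a * p + b * (1 - p)
```

Exact rational arithmetic is used so that the zero integral is an identity rather than a
rounding coincidence.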

From here on, the LS property will be considered for sufficient σ-algebras. These must
be minimal sufficient:

2.3.3 Theorem. For any collection $\mathcal{P}$ of laws on a measurable space $(S, \mathcal{B})$, any LS,
sufficient σ-algebra $\mathcal{C}$ is minimal sufficient.

Proof. If not, there is a sufficient σ-algebra $\mathcal{A}$ and a set $C \in \mathcal{C}$ such that there is no
set $A$ in $\mathcal{A}$ for which $1_C = 1_A$ a.s. for all $P \in \mathcal{P}$. Let $f := E_P(1_C|\mathcal{A})$ for all $P \in \mathcal{P}$
by Theorem 2.1.8. For some $P \in \mathcal{P}$, $f$ is not equal to $1_C$ a.s. ($P$), otherwise letting
$A = \{f = 1\}$ would give a contradiction. We have $\int (1_C - f) f\,dP = 0$, as can be seen
by taking the conditional expectation of the integrand with respect to $\mathcal{A}$ and bringing $f$
outside the conditional expectation by Lemma 2.1.1 as in the proof of Theorem 2.1.4; or,
see RAP, Theorem 10.2.9 (conditional expectation is an orthogonal projection in $L^2$). It
follows from this orthogonality that, for such a $P$,

$$P(C) = \int 1_C^2\,dP = \int (1_C - f)^2\,dP + \int f^2\,dP > \int f^2\,dP.$$

Let $g := E_P(f|\mathcal{C})$. This actually doesn't depend on $P$ since $\mathcal{C}$ is also sufficient. Then
$1_C - g$ is a $\mathcal{C}$-measurable function whose integral over all of $S$ is 0 for every $P \in \mathcal{P}$ since $S$
is in both $\mathcal{A}$ and $\mathcal{C}$. It is not possible that $1_C = g$ a.s., since $\int g^2\,dP \le \int f^2\,dP < P(C)$.
This contradicts the fact that $\mathcal{C}$ is LS. □

From Theorems 2.3.1 and 2.3.3 we see that for a dominated family $\mathcal{P}$ there is an LS
sufficient σ-algebra if and only if the minimal sufficient σ-algebra is LS. It may not be:

2.3.4 Proposition. There exists a density $f$ on $\mathbb{R}$ and $n$ such that if $P_\theta$ has density
$x \mapsto f(x - \theta)$ for $x \in \mathbb{R}$ and $\theta \in \mathbb{R}$, the family of laws $P_\theta^n$, $\theta \in \mathbb{R}$, has the $n$-tuple of order
statistics $(X_{(1)}, \ldots, X_{(n)})$ as a minimal sufficient statistic which is not LS.

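Proof. For any densities $f(x, \theta)$, the density for $n$ i.i.d. variables is

$$\prod_{j=1}^n f(x_j, \theta) = \prod_{j=1}^n f(x_{(j)}, \theta),$$

so the order statistics are sufficient. Take the Cauchy density $1/(\pi(1 + x^2))$ as
$f(x)$ for all $x \in \mathbb{R}$. For any $n$, we can take a measure $\nu$ equivalent to $\{P_\theta^n : \theta \in \mathbb{R}\}$ as $P_0^n$.
Then the density of $P_\theta^n$ with respect to $\nu$ is

$$\prod_{j=1}^n (1 + X_{(j)}^2)/(1 + (X_{(j)} - \theta)^2),$$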

the reciprocal of a polynomial $J(\theta)$ of degree $2n$ in $\theta$. By the proof of Theorem 2.3.1, a
minimal sufficient σ-algebra $\mathcal{A}$ is the smallest σ-algebra making these functions measurable in
$X_1, \ldots, X_n$ for each $\theta$. The coefficients of the 0th through $2n$th powers of $\theta$ are determined
by the values of $J$ at any $2n + 1$ values of $\theta$, say $\theta = 0, 1, \ldots, 2n$, and are linear and thus
Borel measurable functions of these values, and so are $\mathcal{A}$-measurable. The roots of $J$
are the complex numbers $X_{(j)} \pm i$ for $j = 1, \ldots, n$. The order statistics are Borel
measurable functions of the coefficients, and so are also $\mathcal{A}$-measurable, as follows: first,
$X_{(1)}$ is measurable because it is the limit of the sequence $t_k$ where $t_k$ is the infimum of
rational numbers $r$ such that $|J(r + i)| < 1/k$. Recall that for any non-constant polynomial
$J$, $|J(z)| \to \infty$ as $|z| \to \infty$, so $|J|$ only has arbitrarily small values near roots of $J$. Once
we have found $X_{(1)}$, we can divide the polynomial $J$ by $1 + (X_{(1)} - \theta)^2$ to get a new
polynomial of degree $2n - 2$ in $\theta$ whose coefficients are $\mathcal{A}$-measurable, and iterate to get
that all the order statistics $X_{(1)}, \ldots, X_{(n)}$ are $\mathcal{A}$-measurable, so the order statistics are
indeed minimal sufficient.

Now for $n \ge 4$, $E_\theta(X_{(n-1)} - X_{(2)})$ is finite by Problem 5(c) and is a constant $c_n$ not
depending on $\theta$, so $X_{(n-1)} - X_{(2)} - c_n$ is a non-zero $\mathcal{A}$-measurable function with expectation
0 for all $\theta$, so $\mathcal{A}$ is not LS. □

2.3.5 Theorem. If $\mathcal{A}$ is an LS, sufficient σ-algebra for $\mathcal{P}$ and $g$ is a real-valued function
on $\mathcal{P}$ for which an unbiased estimator exists, then there is an $\mathcal{A}$-measurable unbiased
estimator $T$ for $g$, unique up to almost sure equality for all $P \in \mathcal{P}$, and which attains the
minimum possible risk for unbiased estimators simultaneously for all convex loss functions
and all $P \in \mathcal{P}$.

Proof. Let $U$ be any unbiased estimator of $g$ and $T = E(U|\mathcal{A})$, which doesn't depend
on $P \in \mathcal{P}$ by Theorem 2.1.8. Then $T$ is an $\mathcal{A}$-measurable estimator and is unbiased for $g$
since $E_P E_P(U|\mathcal{A}) = E_P U$ for each $P$ from the definition of conditional expectation. $T$ is
unique with these properties up to almost sure equality by Theorem 2.3.2. Here $T$ could
have been obtained as $E(V|\mathcal{A})$ for any other unbiased estimator $V$, and so by Corollary
2.2.3, $T$ attains the minimum possible risk for unbiased estimators for any $P \in \mathcal{P}$ and any
convex loss function. □
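
A small exact computation can illustrate the conditioning step in this proof. The sketch
below assumes $n$ i.i.d. Bernoulli($p$) observations, for which the sum is a standard example
of an LS, sufficient statistic (a fact assumed here, not shown in the text). It compares the
crude unbiased estimator $U = X_1$ of $p$ with $T = E(U|X_1 + \cdots + X_n) = (X_1 + \cdots + X_n)/n$ under
squared-error loss. The function name and the values $n = 4$, $p = 1/3$ are illustrative.

```python
from fractions import Fraction
from itertools import product

def exact_risks(n: int, p: Fraction):
    """Return (E U, E T, Var U, Var T) for U = X_1 and T = (X_1 + ... + X_n)/n
    under n i.i.d. Bernoulli(p) observations, by enumerating all 2^n outcomes."""
    mean_u = mean_t = second_u = second_t = Fraction(0)
    for x in product((0, 1), repeat=n):
        s = sum(x)
        prob = p**s * (1 - p)**(n - s)  # probability of this outcome
        u = Fraction(x[0])              # crude estimator: first observation only
        t = Fraction(s, n)              # conditioned estimator: sample mean
        mean_u += prob * u
        mean_t += prob * t
        second_u += prob * u * u
        second_t += prob * t * t
    return mean_u, mean_t, second_u - mean_u**2, second_t - mean_t**2

p = Fraction(1, 3)
mean_u, mean_t, var_u, var_t = exact_risks(4, p)
```

Both estimators are unbiased, and in this instance the variance drops from $p(1 - p)$ to
$p(1 - p)/n$ after conditioning.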

In the example in Prop. 2.3.4 for $n = 6$, $X_{2,5} := (X_{(2)} + X_{(5)})/2$ and $X_{3,4} :=
(X_{(3)} + X_{(4)})/2$ are two different unbiased estimators of $\theta$, both measurable for the minimal
sufficient σ-algebra, which is not LS. An unbiased estimator measurable for a minimal
sufficient σ-algebra doesn't necessarily attain minimum risk for such estimators for any $\theta$.
In fact, $X_{2,5}$ has infinite risk for squared-error loss, while $X_{3,4}$ has finite risk, by Problem
5(d). In such cases there is no LS sufficient σ-algebra $\mathcal{A}$ since by Theorem 2.3.3, $\mathcal{A}$ would be
minimal and by Theorem 2.3.1, for a family dominated by a σ-finite measure, the minimal
sufficient σ-algebra is essentially unique.

Theorem 2.2.2 (the Rao-Blackwell theorem) shows that estimators become no worse
for convex loss functions by conditioning (taking conditional expectation) with respect to a
sufficient σ-algebra, but it is not clear when conditioning makes estimators strictly better.
For that we need the notion of strict convexity. A function $f$ from a convex set $C$ into $\mathbb{R}$
is called strictly convex if for any $x \ne y$ in $C$ and $0 < u < 1$ we have

$$f(ux + (1 - u)y) < u f(x) + (1 - u) f(y).$$

Thus $f(x) := x^2$ is strictly convex but $f(x) := |x|$ is not, specifically when $x$ and $y$ have
the same sign.
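
The two examples can be compared directly at a few points. The sketch below evaluates the
gap $u f(x) + (1 - u) f(y) - f(ux + (1 - u)y)$, which must be positive for every admissible
triple if $f$ is strictly convex; the helper name and the chosen points are illustrative
assumptions, and a check at finitely many points can refute strict convexity but not
establish it.

```python
def strict_gap(f, x: float, y: float, u: float) -> float:
    """Return u*f(x) + (1-u)*f(y) - f(u*x + (1-u)*y).
    A positive value means the strict inequality holds at this triple (x, y, u)."""
    return u * f(x) + (1 - u) * f(y) - f(u * x + (1 - u) * y)

# f(x) = x^2 at x = 1, y = 3, u = 1/2: the gap is strictly positive.
square_gap = strict_gap(lambda v: v * v, 1.0, 3.0, 0.5)
# f(x) = |x| with x and y of the same sign: the gap vanishes, so |x| is not strictly convex.
abs_gap_same_sign = strict_gap(abs, 1.0, 3.0, 0.5)
# f(x) = |x| with x and y of opposite signs: the gap is positive at this triple.
abs_gap_opposite_sign = strict_gap(abs, -1.0, 3.0, 0.5)
```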

2.3.6 Theorem. Assume given a decision problem where the action space $A$ is a convex
Borel subset of some $\mathbb{R}^k$ and the loss function $W(P, \cdot)$ is strictly convex on $A$ and Borel
measurable, for each $P \in \mathcal{P}$. Let $\mathcal{A}$ be a sufficient σ-algebra for $\mathcal{P}$ and $U$ a decision rule
such that $\int \|U\|\,dP < \infty$ for all $P \in \mathcal{P}$ and such that for some $P \in \mathcal{P}$, $U$ is not equal
$P$-almost surely to an $\mathcal{A}$-measurable function. Then for such a $P$, $T := E_P(U|\mathcal{A})$, if it
has finite risk $r(P, T)$, has strictly smaller risk for $P$ and $W$ than $U$ has.

Proof. Recall that $E_Q(U|\mathcal{A})$ doesn't depend on $Q \in \mathcal{P}$ by Theorem 2.1.8. We will first
have:

2.3.7 Theorem. Let $C$ be a convex Borel set in some $\mathbb{R}^k$. Let $(\Omega, \mathcal{A}, P)$ be a probability
space and $X$ a random variable on $\Omega$ with values in $C$, $E|X| < \infty$ and such that $X$ is not
a constant a.s. Let $f$ be a strictly convex, Borel measurable real-valued function on $C$,
such that $E|f(X)| < \infty$. Then $EX \in C$,

(a) $Ef(X) > f(EX)$, and

(b) If $\mathcal{C}$ is any sub-σ-algebra of $\mathcal{A}$, then $E(X|\mathcal{C}) \in C$ a.s., and if $X$ is not equal almost
everywhere to a $\mathcal{C}$-measurable function, then $E(f(X)|\mathcal{C}) > f(E(X|\mathcal{C}))$ with probability
$> 0$.

Proof. From Jensen's inequality (RAP, Theorem 10.2.6) and its proof, $EX \in C$ and
there is a constant $c$ and a linear function $g$ such that $f(x) \ge c - g(x)$ for all $x \in C$ and
$f(EX) = c - g(EX)$. If $y \in C$, $y \ne EX$ and $f(y) = c - g(y)$, then $x := (y + EX)/2 \in C$
and $f(x) < (f(EX) + f(y))/2 = c - g(x)$, a contradiction. So $f(y) > c - g(y)$ for all $y \ne EX$
in $C$. Thus if $X$ is not constant a.s., $Ef(X) > c - Eg(X) = f(EX)$. So (a), a strict form of
Jensen's inequality, holds.

Then, the conditional Jensen inequality (RAP, Theorem 10.2.7) and its proof apply
since a Borel set in a Polish space (complete separable metric space, here $\mathbb{R}^k$) is
Borel-isomorphic to a Polish space (RAP, Theorem 13.1.1), so regular conditional probabilities
for $X$ given $\mathcal{C}$ exist (RAP, Theorem 10.2.2). If $X$ is not equal a.s. to a $\mathcal{C}$-measurable
function, specifically to $E(X|\mathcal{C})$, then with positive probability, the regular conditional
probabilities are not concentrated at single points. Since the conditional expectation can
be defined by integrating with respect to regular conditional probabilities (RAP, Theorem
10.2.5), it follows from part (a) that $E(f(X)|\mathcal{C}) > f(E(X|\mathcal{C}))$ with positive probability. □

Now, the conclusion of Theorem 2.3.6 clearly holds if $r(P, U) = +\infty$, or if $r(P, U) < \infty$,
it follows directly from Theorem 2.3.7. □
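
The strict Jensen inequality of Theorem 2.3.7(a) can be checked exactly on a finitely
supported law. The sketch below assumes $f(x) = x^2$ and a two-point distribution; the
function name and the chosen values are illustrative, and exact rational arithmetic is
used so that the equality case is an identity rather than a rounding artifact.

```python
from fractions import Fraction

def jensen_gap(values, probs, f):
    """Return E f(X) - f(E X) for a finitely supported random variable X."""
    ex = sum(p * v for v, p in zip(values, probs))
    efx = sum(p * f(v) for v, p in zip(values, probs))
    return efx - f(ex)

half = Fraction(1, 2)
# X takes the values 0 and 2 with probability 1/2 each: not constant, so the gap is positive.
gap_nonconstant = jensen_gap([Fraction(0), Fraction(2)], [half, half], lambda v: v * v)
# X is a.s. equal to 1: the hypothesis of the theorem fails and the gap is zero.
gap_constant = jensen_gap([Fraction(1), Fraction(1)], [half, half], lambda v: v * v)
```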

2.3.8 Corollary. Under the conditions of Theorem 2.3.6, if there exists a decision rule $U$
with $\int \|U\|\,dQ < \infty$ and $r(Q, U) < \infty$ for all $Q \in \mathcal{P}$, then the $\mathcal{A}$-measurable decision rules,
and those equal to them a.s. for all $P$, form a complete class.

Note that Theorem 2.3.6 and Corollary 2.3.8 are not limited to unbiased estimators or
to estimators at all; they hold for general decision rules. Also, $E(U|\mathcal{A})$ improves on $U$ not
only for an individual loss function but simultaneously for all convex loss functions. By
taking a minimal sufficient σ-algebra $\mathcal{A}$, as is possible for dominated families by Theorem
2.3.1, under the hypotheses of Corollary 2.3.8, we get a complete class of decision rules for
strictly convex loss functions. In other words, decision rules which are not $\mathcal{A}$-measurable
(up to almost sure equality for all $P \in \mathcal{P}$) are inadmissible. By Theorem 2.3.3 an LS,
sufficient σ-algebra will be minimal sufficient, so that if a statistic $T$ is LS and sufficient
and is an unbiased estimator of a function $g(\theta)$, it is optimal among unbiased estimators,
while among general estimators, we can limit the choice to functions of $T$.

PROBLEMS 

1. If $\mathcal{P} = \{P_1, \ldots, P_k\}$ is a finite set of laws on $(X, \mathcal{B})$ and all $P_j$ are absolutely continuous
with respect to $P_1$, show that $x \mapsto (dP_2/dP_1, \ldots, dP_k/dP_1)(x)$ is a minimal sufficient
statistic for $\mathcal{P}$.

2. Let $\mathcal{U}$ be the set of all uniform distributions on intervals $[a, b] \subset \mathbb{R}$ for $a < b$. In Sec.
2.1, Problem 4 was to show that for $n$ i.i.d. observations from a distribution in $\mathcal{U}$, the
smallest and largest order statistics $(X_{(1)}, X_{(n)})$ form a sufficient statistic (with values
in $\mathbb{R}^2$). Show that this statistic is minimal sufficient.

3. For a fixed $h > 0$, let $\mathcal{U}_h$ be the set of all uniform distributions on intervals $[\theta, \theta + h]$,
$\theta \in \mathbb{R}$.

(a) Show that $(X_{(1)}, X_{(n)})$ is also minimal sufficient for this family.

(b) Show that this minimal sufficient statistic is not LS: give two different unbiased
estimators for $\theta$ which are both functions of $(X_{(1)}, X_{(n)})$. Hint: If $X_1, \ldots, X_n$ are i.i.d.
$U[0,1]$, then $P(X_{(n)} \le x) = x^n$ for $0 \le x \le 1$. Thus $X_{(n)}$ has density $n x^{n-1}$ for
$0 \le x \le 1$ and $EX_{(n)} = n/(n + 1)$. Likewise, $EX_{(1)} = 1/(n + 1)$.
4. Show that for the family $N(0, \sigma^2)$, $\sigma > 0$, on $\mathbb{R}$, for $n = 1$ (the example before Theorem
2.3.1), $|x|$ is a minimal sufficient statistic and for $n$ i.i.d. variables $X_1, \ldots, X_n$ a
minimal sufficient statistic is $X_1^2 + \cdots + X_n^2$.

5. (a) If $X$ has the standard Cauchy density $1/[\pi(1 + x^2)]$ for $-\infty < x < \infty$, show that
$P(X > x) \sim 1/(\pi x)$ as $x \to +\infty$, where $f \sim g$ means $f/g \to 1$.

(b) If $X_1, \ldots, X_n$ are i.i.d. standard Cauchy, show that for each $k = 1, \ldots, n$,
$P(X_{(k)} < -x) = P(X_{(n+1-k)} > x) \sim \binom{n}{k}/(\pi x)^k$ as $x \to +\infty$.

(c) In part (b), show that $E|X_{(k)}| < \infty$ if and only if $1 < k < n$. Hint: for a
random variable $Y \ge 0$ with density $g$,
$EY = \int_0^\infty y g(y)\,dy = \int_0^\infty g(y) \int_0^y dt\,dy = \int_0^\infty \int_t^\infty g(y)\,dy\,dt = \int_0^\infty P(Y > t)\,dt$.

(d) Similarly, show that $E(X_{(k)}^2) < \infty$ if and only if $3 \le k \le n - 2$.

NOTES 

The Lehmann-Scheffé property, which they called completeness, is due to Lehmann
and Scheffé (1950, 1955, 1956).